NSF PAR Search | NSF Public Access Repository

Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech

Kebe, Gaoussou Youssouf; Richards, Luke E.; Raff, Edward; Ferraro, Francis; Matuszek, Cynthia (February 2022, AAAI Conference on Artificial Intelligence (AAAI))

Learning to understand grounded language, which connects natural language to percepts, is a critical research area. Prior work in grounded language acquisition has focused primarily on textual inputs. In this work, we demonstrate the feasibility of performing grounded language acquisition on paired visual percepts and raw speech inputs. This will allow interactions in which language about novel tasks and environments is learned from end-users, reducing dependence on textual inputs and potentially mitigating the effects of demographic bias found in widely available speech recognition systems. We leverage recent work in self-supervised speech representation models and show that learned representations of speech can make language grounding systems more inclusive towards specific groups while maintaining or even increasing general performance.
Practical Cross-Modal Manifold Alignment for Robotic Grounded Language Learning

https://doi.org/10.1109/CVPRW53098.2021.00177

Nguyen, Andre T.; Richards, Luke E.; Kebe, Gaoussou Youssouf; Raff, Edward; Darvish, Kasra; Ferraro, Francis; Matuszek, Cynthia (June 2021, IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2021)

We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines, which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
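The abstract does not spell out the loss formulation or how triples are drawn. The sketch below shows the standard hinge-form triplet loss with Euclidean distance, applied to already-computed embedding vectors, plus a simple triple sampler. It is an illustration under stated assumptions, not the authors' implementation: the helper names `euclidean`, `triplet_loss`, and `sample_triplet`, the margin of 1.0, and the choice of a vision embedding as anchor with language embeddings as positive and negative are all hypothetical. The encoders that would produce these vectors, and the training loop, are omitted.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-form triplet loss (margin value is an assumption).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triplet(pairs, rng):
    """Hypothetical sampler over paired embeddings.

    `pairs` is a list of (vision_embedding, language_embedding) tuples,
    one per item. The anchor is one item's vision embedding, the positive
    is that same item's language embedding, and the negative is a
    different item's language embedding (this pairing scheme is assumed).
    """
    i, j = rng.sample(range(len(pairs)), 2)  # two distinct item indices
    return pairs[i][0], pairs[i][1], pairs[j][1]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D "embeddings": each item's two modalities already lie close together.
    pairs = [([0.0, 0.0], [0.1, 0.0]), ([5.0, 5.0], [5.0, 4.9])]
    a, p, n = sample_triplet(pairs, rng)
    print(triplet_loss(a, p, n))  # prints 0.0: the margin is already satisfied
```

In training, this loss would be averaged over many sampled triples and minimized with respect to the parameters of both modality encoders, pulling matching image/description embeddings together and pushing mismatched ones apart.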
Towards Making Virtual Human-Robot Interaction a Reality

Higgins, Padraig; Kebe, Gaoussou Youssouf; Berlier, Adam; Darvish, Kasra; Engel, Don; Ferraro, Francis; Matuszek, Cynthia (March 2021, Proc. of the 3rd International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI))

For robots deployed in human-centric spaces, natural language promises an intuitive interface. However, obtaining appropriate training data for grounded language in a variety of settings is a significant barrier. In this work, we describe using human-robot interactions in virtual reality to train a robot, combining fully simulated sensing and actuation with human interaction. We present the architecture of our simulator and our grounded language learning approach, then describe our intended initial experiments.